WEIGHTED FIREFLY OPTIMIZATION (IWFFO)
Abstract: Heart-related disease is a
significant cause of shortened human life.
People in populous countries depend on the
healthcare industry and therefore need accurate
test results in a short time. The healthcare
industry generates a very large amount of
information every day. A Random Forest (RF)
algorithm with IWFFO is proposed for heart
disease prediction. The overall performance of
the proposed method was compared with the
prior Support Vector Machine (SVM) with
Recursive Feature Elimination (RFE). A total of
14 attributes from the Cleveland heart disease
dataset are used: age, sex, cp, trestbps, chol,
fbs, restecg, thalach, exang, oldpeak, slope, ca,
thal and target. IWFFO selects cp, trestbps,
chol, fbs, thalach, exang, oldpeak, ca, thal and
target. Applying the IWFFO feature selection
method to the RF algorithm achieved a high
accuracy of 98.7% when compared with the SVM
algorithm.


Keywords: Data mining, Heart Disease, 
Classification, Feature Selection, Dataset. 


1. INTRODUCTION 

The heart is the most important organ of the
human body; it circulates blood throughout the
body. Cardiac disease is a prominent cause of
death, and the mortality rate has spiked in a
short span of time. Diseases associated with the
heart are referred to as cardiovascular diseases.
They are seen more in developing countries than
in developed ones. Inaccurate diagnosis may
cause fatalities, so precision and safety in
diagnosing heart disease are prime factors in
healthcare practice. Diagnosis is therefore a
multifaceted problem that should not be driven
by false presumptions or lead to dubious side
effects. Various conditions affect the
circulatory system, so heart disease cannot be
specified as a single disease.
2. SYMPTOMS 

Risk factors associated with heart disease
include smoking, obesity, depression,
hypertension, high blood cholesterol, poor diet,
family history and physical inactivity.

A. Types of Heart Disease

• Coronary heart disease
• Angina pectoris
• Congestive heart failure
• Cardiomyopathy
• Congenital heart disease
• Arrhythmias
• Myocarditis
• Heart attack
• Heart cancer etc.

To detect cardiovascular problems in patients,
prediction methods such as SVM, decision trees
and neural networks have been compared on the
basis of performance metrics. SVM was found to
perform better than the other methods in
predicting cardiovascular disease [73]. In this
work, to forecast the presence of heart disease,
a performance comparison is carried out between
Recursive Feature Elimination (RFE) feature
selection with an SVM classifier and Random
Forest (RF) with Intensity Weighted Firefly
Optimization (IWFFO).

3. LITERATURE REVIEW 
Murthy & Meenakshi (2014) [36] proposed a
neuro-genetic model to predict coronary heart
disease. Their work performed feature subset
selection with a multi-objective genetic
algorithm without giving up the accuracy of the
earlier ANN-based heart disease predictor. The
neuro-genetic model was validated on data from
303 patients of various age groups. The study
showed early detection of heart disease with an
experimental accuracy of 89.58% using a reduced
feature subset, thereby lowering the complexity.

Alshurafa et al. (2014) [37] examined results
from a Remote Health Monitoring (RHM) system
deployed in a six-month Women's Heart Health
study of 90 patients, and applied advanced
feature selection and machine learning
algorithms. The approach identifies the patients'
key baseline features and builds effective
prediction models to help determine the success
of RHM outcomes. Results from the Women's
Heart Health study showed that heart disease
risk, quality of life, family history, stress
factors, social support and anxiety at baseline
each helped to predict the success of a patient's
RHM outcome.

Gao et al. (2015) [38] introduced two novel
features for predicting cardiac arrest:
Approximate Entropy (ApEn) and Sample Entropy
(SpEn). Principal Component Analysis (PCA) was
adopted to reduce the dimensionality. The
results showed that adding ApEn and SpEn to the
24 parameters improved prediction performance
considerably compared with using the 24
parameters alone. Dimensionality reduction
further improved the prediction results.

Kaya & Pehlivan (2015) [89] discussed the
classification of Premature Ventricular
Contraction (PVC) heartbeats from ECG signals
and, in particular, the assessment of the
performance of particular features using genetic
algorithms.

Boshra Bahrami et al. [40] explored several
approaches to diagnosing heart disease. Various
classification techniques, namely Decision Tree,
Naive Bayes (NB), J48, K Nearest Neighbours
(KNN) and SMO, were used to classify the dataset.
The techniques were compared using metrics such
as precision, specificity, area under the ROC
curve, accuracy, sensitivity and F-measure. The
comparison shows that the J48 decision tree is
the best classifier for diagnosing heart disease
on the dataset used.


4. MACHINE LEARNING APPROACH 

Machine learning is an interdisciplinary field
that focuses on the design of algorithms which
computers use to learn. Here, learning means
learning from the feature dataset. In general,
machine learning techniques are designed and
implemented so that an expert system can produce
a solution to a diagnostic problem from previous
information.

The learning methods available for the
classification task are supervised
classification, unsupervised classification and
reinforcement learning. Data mining is an
important application of machine learning.
Classification is a supervised learning process,
and the classification tool is used to predict
the target class.

A. CLASSIFICATION 

Classification is a common decision-making task
in human activity. Classification problems arise
when objects must be assigned to pre-specified
groups or classes on the basis of a number of
observed features of that object. Many
industrial problems can be treated as
classification problems, for instance stock
market prediction, weather forecasting,
bankruptcy prediction, speech recognition and
character recognition. Classification problems
may be solved both mathematically and in a
non-linear manner.

B. FEATURE SELECTION 

Feature selection is a significant preprocessing
step that is mainly used in data mining
applications. In the medical field, the volume
of information is becoming very large, which
makes decision making more difficult and
degrades the prediction results in disease
forecasting.

Feature selection plays an important role in
solving scalability problems and improves the
overall classification accuracy by removing
irrelevant features. It is an effective tool for
choosing the relevant features from a medical
dataset. Reducing unwanted features in the
dataset is therefore a significant step in data
mining for achieving better accuracy.

In this research, the dataset used for the
experimental setup is the Cleveland database
from the UCI repository. Data from 297 patients
with 14 attributes are used. The proposed system
is developed in the Java language. The NetBeans
IDE is used for front-end design, and MySQL is
used for database access.

Table 1 Feature information and class

Attribute | Type    | Description
age       | Numeric | age in years
sex       | Numeric | sex (1 = male; 0 = female)
cp        | Numeric | chest pain type (Value 1: typical angina; Value 2: atypical angina; Value 3: non-anginal pain; Value 4: asymptomatic)
trestbps  | Numeric | resting blood pressure (in mm Hg on admission to the hospital)
chol      | Numeric | serum cholesterol in mg/dl
fbs       | Numeric | fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg   | Numeric | resting electrocardiographic results
thalach   | Numeric | maximum heart rate achieved
exang     | Numeric | exercise induced angina (1 = yes; 0 = no)
oldpeak   | Numeric | ST depression induced by exercise relative to rest
slope     | Numeric | slope of the peak exercise ST segment (Value 1: upsloping; Value 2: flat; Value 3: downsloping)
ca        | Numeric | number of major vessels (0-3) colored by fluoroscopy
thal      | Numeric | 3 = normal; 6 = fixed defect; 7 = reversible defect
target    | Numeric | diagnosis of heart disease (angiographic disease status) (Value 0: < 50% diameter narrowing; Value 1: > 50% diameter narrowing)
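
As an illustration of how records with these 14 attributes might be read, the following Python sketch parses comma-separated rows in the attribute order of Table 1. It assumes that missing values are marked with "?" and that any non-zero diagnosis value counts as disease present; both are assumptions, as are the helper names `parse_record` and `load_records`. The sample rows are illustrative only.

```python
from typing import Dict, List, Optional

# Attribute order follows Table 1; the last column is the class label.
ATTRIBUTES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
              "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]


def parse_record(line: str) -> Optional[Dict[str, float]]:
    """Parse one comma-separated row; return None if any value is missing."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != len(ATTRIBUTES) or "?" in fields:
        return None
    record = dict(zip(ATTRIBUTES, map(float, fields)))
    # Assumption: any non-zero diagnosis value is treated as disease present.
    record["target"] = 1.0 if record["target"] > 0 else 0.0
    return record


def load_records(lines: List[str]) -> List[Dict[str, float]]:
    """Keep only complete rows (hypothetical helper)."""
    parsed = (parse_record(line) for line in lines if line.strip())
    return [r for r in parsed if r is not None]


# Illustrative rows only; the third is incomplete and is dropped.
sample = [
    "63,1,1,145,233,1,2,150,0,2.3,3,0,6,0",
    "67,1,4,160,286,0,2,108,1,1.5,2,3,3,2",
    "53,1,3,128,216,0,2,115,0,0.0,1,0,?,0",
]
records = load_records(sample)
```

Returning `None` for incomplete rows keeps the filtering decision in one place, so a different missing-value policy could be swapped in without touching the parser.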


5. RECURSIVE FEATURE ELIMINATION (RFE) 

In the RFE process, the entire dataset is split
into several slots, and the necessary protein is
calculated for each slot. Duplicate values are
pruned first. Initially, two slots are formed by
moving values from one slot to the other; if
either slot holds more values than the other, the
two slots are equalized. An equation is then used
to measure, side by side, the average of a given
feature. The value in the first slot that is
nearest to this mean is known as the critical
gene for that slot.

In the present heart disease prediction system,
RFE calculates $\|w\|$ to find the contribution
of each attribute, where $w$ denotes the weight
vector. $\|w\|^2$ is used as the ranking
variable: features with a higher value of
$\|w\|^2$ are considered significant by RFE.

\[ \|w\|^2 = \Big| \sum_{i=1}^{n} \alpha_i y_i \Phi(x_{ij}) \Big|^2, \quad j = 1, 2, \ldots, m \]

where
$\Phi(x_{ij})$ is the occurrence of the feature vector.


This feature selection method is a backward
procedure: it iteratively removes the unwanted
attributes from the dataset.

The workflow of RFE is as follows. Initially,
the model considers all attributes (features) of
the input dataset. The ranking factor is computed
for each attribute; the attributes with a high
rank, which improve the performance of the model,
are selected, and the remaining attributes are
considered unwanted and removed. This process is
repeated to construct the final model that fits
the final feature selection result. The procedure
stops when it reaches the stopping criterion.
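
The backward elimination loop described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: the ranking function `mean_gap_score` is a stand-in (the squared gap between class means) that replaces the SVM weight-based criterion, and all function names and the toy data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]
RankFn = Callable[[List[Row], List[int], List[int]], Dict[int, float]]


def mean_gap_score(X: List[Row], y: List[int],
                   features: List[int]) -> Dict[int, float]:
    """Stand-in ranking: squared gap between the two class means.

    Assumption: this replaces the SVM weight-based criterion for
    illustration only; labels are 0 and 1 and both classes are present.
    """
    scores = {}
    for j in features:
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        gap = sum(pos) / len(pos) - sum(neg) / len(neg)
        scores[j] = gap * gap
    return scores


def recursive_feature_elimination(X: List[Row], y: List[int], n_keep: int,
                                  rank_fn: RankFn = mean_gap_score) -> List[int]:
    """Start from all features and drop the lowest-ranked one per iteration."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:            # stopping criterion
        scores = rank_fn(X, y, remaining)     # re-rank the surviving features
        worst = min(remaining, key=lambda j: scores[j])
        remaining.remove(worst)
    return remaining


# Toy data: feature 0 separates the classes well, feature 1 not at all.
X = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.5], [3.0, 5.0, 3.5], [3.2, 5.1, 3.0]]
y = [0, 0, 1, 1]
selected = recursive_feature_elimination(X, y, n_keep=2)
```

Passing the ranking function as a parameter keeps the elimination loop independent of the model, so a weight-based score from a trained SVM could be plugged in instead.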
Support Vector Machines (SVM) 

SVMs are typically used in pattern recognition
as well as object recognition. Given a set of
points that each belong to one of two classes, a
linear SVM finds the hyperplane that leaves the
largest possible fraction of points of the same
class on the same side while maximizing the
distance of either class from the hyperplane.
As classification methods, SVMs have yielded
improved classification results compared with
other commonly used pattern techniques such as
maximum likelihood and neural network
classifiers. They are typically beneficial for
classifying remotely sensed information.
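
To make the hyperplane idea concrete, the short Python sketch below evaluates a given separating hyperplane and measures its margin. It does not train an SVM: the weight vector `w` and bias `b` are hand-picked for the toy points, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify a point as +1 or -1 according to its side."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def geometric_margin(w: Sequence[float], b: float,
                     X: List[Sequence[float]], y: List[int]) -> float:
    """Smallest signed distance of any labelled point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(label * decision_value(w, b, x) / norm
               for x, label in zip(X, y))


# Toy points with labels -1 / +1; hyperplane x1 = 2 chosen by hand.
points = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [4.0, 2.0]]
labels = [-1, -1, 1, 1]
w, b = [1.0, 0.0], -2.0
margin = geometric_margin(w, b, points, labels)
```

A linear SVM would search for the `w` and `b` that make this margin as large as possible; here the margin is only computed for one candidate hyperplane.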

In SVMs and other kernel methods, the number of
parameters to be optimized is bounded by the
number of training samples rather than by the
dimension of the feature space. Kernel methods
therefore address the dimensionality problem to
an extent that depends on the samples. In
addition, neuroimaging data show a high
correlation between features, so the effective
dimensionality is lower than the actual number
of voxels and samples.

Moreover, the advantages of feature selection
outweigh the disadvantages. The advantages are:
a) It speeds up the testing process.
b) It makes interpretation easy.
c) It is simple to evaluate the results of
smaller subsets sufficient to maintain
classification accuracies.



6. PROPOSED METHODOLOGY 


The proposed methodology for heart disease
prediction using the data mining process is
described with the architecture shown in Figure
3.8. First, the best features are selected by
Intensity Weighted Firefly Optimization (IWFFO).
Then the classification process is performed to
predict congestive heart failure.

A. Intensity Weighted Firefly Optimization 
(IWFFO) 

Feature selection is the process of selecting
useful features from a dataset. A firefly emits
flashes of light to communicate with other
members of its species. Because the intensity of
light fades with distance, its accuracy can be
defined at local horizons for finding the best
solution of any function. The fireflies are the
particles, that is, the features extracted from
peak estimations. Each extracted feature
(firefly) is assigned a light intensity, and out
of all the extracted features, the distinct
features that belong to a common species are
selected as the best ones. This is best explained
by contours in which random regions are created
based on the nature of the extracted features,
and particles of similar species are attracted
towards the centre of the regions of the
contours.

The random regions are created based on the
feature categories, and particles of a similar
nature follow their own regions. Out of all the
particles, some lie at the centre of the regions,
and these are defined as the best features for
better classification of the disease. Firefly
optimization thus serves as a feature reduction
technique by keeping particles of a similar
nature and neglecting the others. Let $T_F$ be
the feature vector or feature matrix. On
selecting a training feature $T_F$, define
$\alpha$, $\beta$ and $\gamma$ with some values
(here 0.2, 1.0 and 1.0 are considered
respectively).

Let $X = \{x_i\}$, $i = 1, 2, 3, \ldots, n$,
where $n$ is the number of particles and $X$ is
the population of fireflies.

Define
\[ I = \mathrm{rand}(T_F) \]
where $I$ denotes the light intensity. The observation
coefficient is updated as

\[ \gamma = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \]

where $i = 1, 2, 3, \ldots, n$ and $j = 1, 2, 3, \ldots, m$. The final
updates are expressed as

\[ x_n = x_i (1 - \beta) + x_j \beta + \alpha(\mathrm{rand} - 0.5) \]
\[ y_n = y_i (1 - \beta) + y_j \beta + \alpha(\mathrm{rand} - 0.5) \]

When the light intensity has been updated over some iterations,
the final values are indicated as

\[ F = I(x, y) \quad \text{(exact fitness value)} \]
\[ idx = \arg\min(F) \quad \text{(index of the exact best fitness value)} \]
\[ T_s = T_F(idx) \quad \text{(selected best feature)} \]
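
The update and selection steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: lower fitness is treated as better (matching the use of the minimum), the distance between two fireflies is held in `dist`, and the attractiveness rule `beta0 * exp(-gamma * dist**2)` is taken from the standard firefly algorithm because the text does not spell out how $\beta$ depends on distance. The function names, the `sphere` fitness and the feature names in the usage are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def firefly_step(positions: List[Point], fitness: Callable[[Point], float],
                 rng: random.Random, alpha: float = 0.2,
                 beta0: float = 1.0, gamma: float = 1.0) -> List[Point]:
    """One pass in which each firefly moves toward every fitter firefly."""
    old = list(positions)
    new = list(positions)
    for i in range(len(old)):
        for j in range(len(old)):
            # Lower fitness is treated as better (brighter).
            if fitness(old[j]) < fitness(new[i]):
                xi, yi = new[i]
                xj, yj = old[j]
                dist = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2)
                # Assumed attractiveness rule (standard firefly algorithm).
                beta = beta0 * math.exp(-gamma * dist ** 2)
                new[i] = (
                    xi * (1 - beta) + xj * beta + alpha * (rng.random() - 0.5),
                    yi * (1 - beta) + yj * beta + alpha * (rng.random() - 0.5),
                )
    return new


def select_best(features: List[str], positions: List[Point],
                fitness: Callable[[Point], float]) -> str:
    """Return the feature whose firefly has the minimum fitness."""
    scores = [fitness(p) for p in positions]
    idx = min(range(len(scores)), key=scores.__getitem__)
    return features[idx]


def sphere(p: Point) -> float:
    """Hypothetical fitness: squared distance from the origin."""
    return p[0] ** 2 + p[1] ** 2


rng = random.Random(0)
swarm = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
for _ in range(10):
    swarm = firefly_step(swarm, sphere, rng)
```

With `alpha` set to 0 the random term vanishes and the move reduces to a weighted average of the two positions, which makes the step easy to check by hand.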


7. RANDOM FORESTS 

The Random Forest Tree (RFT) classifier is an
ensemble classification approach that combines
many decision trees. Breiman stated that the RFT
grows a large number of trees based on random
selection of the variables. During the learning
stage of the classifier, tree nodes are split
using a random subset of the data features. The
RFT classifier works on the bagging concept, in
which each successive tree is generated
independently from a bootstrap sample of the data
items, and classification of a data item is based
on the majority vote.

RF overcomes the drawbacks of the traditional
decision tree approaches. In RF, 10-fold
cross-validation is taken as the default
setting.
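
The bagging, random feature subset and majority vote steps can be sketched in Python as follows. This is a simplified illustration, not the classifier used in this work: each tree is reduced to a one-split stump, labels are assumed to be 0 and 1, the split rule (midpoint between class means) is a stand-in for an impurity-based split, and all function names and the toy data are hypothetical. A small helper for k-fold index splitting is included as well.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]
Stump = Tuple[int, float, int, int]  # feature, threshold, label above, label below


def bootstrap_sample(data: List[Sample], rng: random.Random) -> List[Sample]:
    """Draw len(data) items with replacement (bagging)."""
    return [rng.choice(data) for _ in data]


def train_stump(data: List[Sample], n_features: int,
                rng: random.Random) -> Stump:
    """Fit a one-split tree on a random subset of the features."""
    labels = [label for _, label in data]
    if len(set(labels)) == 1:                 # degenerate bootstrap sample
        return (0, float("inf"), labels[0], labels[0])
    candidates = rng.sample(range(len(data[0][0])), n_features)
    best = None
    for j in candidates:
        ones = [x[j] for x, label in data if label == 1]
        zeros = [x[j] for x, label in data if label == 0]
        mean1, mean0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
        gap = abs(mean1 - mean0)
        if best is None or gap > best[0]:
            best = (gap, j, (mean0 + mean1) / 2, 1 if mean1 > mean0 else 0)
    _, j, threshold, above = best
    return (j, threshold, above, 1 - above)


def train_forest(data: List[Sample], n_trees: int, n_features: int,
                 rng: random.Random) -> List[Stump]:
    """Each tree is trained independently on its own bootstrap sample."""
    return [train_stump(bootstrap_sample(data, rng), n_features, rng)
            for _ in range(n_trees)]


def predict_forest(forest: List[Stump], x: Sequence[float]) -> int:
    """Classify by majority vote over all trees."""
    votes = Counter(above if x[j] > t else below
                    for j, t, above, below in forest)
    return votes.most_common(1)[0][0]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k nearly equal test folds."""
    return [list(range(n))[i::k] for i in range(k)]


data = [([0.0, 0.1], 0), ([0.1, 0.3], 0), ([0.2, 0.0], 0), ([0.3, 0.2], 0),
        ([0.7, 0.9], 1), ([0.8, 0.7], 1), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
forest = train_forest(data, n_trees=25, n_features=1, rng=random.Random(42))
folds = k_fold_indices(297, 10)
```

Using an odd number of trees avoids tied votes in the two-class case.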


8. EXPERIMENTAL RESULT 


The original Cleveland dataset with 303
instances is given as input to the SVM-RFE
algorithm. The results obtained are tabulated in
Figure 3.9. The dataset yields improved weighted
average values of Precision 90%, Recall 89% and
F-measure 94%, and achieves the highest accuracy
of 94%.


TABLE II. EXPERIMENTAL RESULT OF PROPOSED METHOD RF-IWFFO


Algorithm | Precision (%) | Recall (%) | F-Measure (%) | Accuracy (%)
SVM-RFE   | 90.3          | 89.4       | 94.5          | 94.5

Fig. 1 Classification Algorithm of SVM-RFE

The original Cleveland dataset with 303
instances is given as input to the RF-IWFFO
algorithm. The results obtained are tabulated in
Figure 3.10. The dataset yields improved weighted
average values of Precision 93%, Recall 90% and
F-measure 98%, and achieves the highest accuracy
of 98%. Compared with the SVM-RFE classifier, the
RF-IWFFO classifier obtains the highest accuracy,
98.7%, in heart disease prediction.
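
For reference, the metrics reported above can be computed from predicted and true labels as in the Python sketch below. It is a minimal illustration that computes the metrics for the positive class only, whereas the values in this section are weighted averages; the function name and the toy labels are hypothetical and are not results from the experiments.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Precision, recall, F-measure and accuracy for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

The guards against empty denominators return 0.0 rather than raising, which is one possible convention when a class is never predicted.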


Vol. 13 No. 05 May 2023


[Figure 2 plot: x-axis "Performance Metrics" with categories Recall, F-measure and Accuracy]


Fig. 2. Classification Algorithm of RF-IWFFO


9. CONCLUSION 

Cardiac diseases pose a serious threat. A
cardiac problem occurs when the arteries that
supply oxygen and blood to the heart are
completely blocked or narrowed. Although
healthcare organizations produce a huge amount of
data, the data is not appropriately utilized.
Classification problems, which allocate
observations to distinct groups, play a
significant role in decision making, business
decisions among others. The SVM-RFE algorithm
gives 94.5% predictive accuracy, whereas the
Random Forest-IWFFO algorithm gives 98.7%
predictive accuracy. These results were obtained
with fewer iterations and show an improvement
over SVM-RFE.